### A Pluto.jl notebook ###
# v0.20.4

using Markdown
using InteractiveUtils

# ╔═╡ e89ac452-28e1-4f30-808d-468d52fddba7
import Pkg; Pkg.activate(".")

# ╔═╡ 11e8bdad-39b4-41b7-a75d-484412b76190
begin
	using CairoMakie
	using CommonMark
	using DataFrames
	using Distributions
	using FITSIO
	using Format
	using Optim
	using PairPlots
	using PlutoUI
	using PValue
	using Random
	import StatsPlots
	using Turing
end

# ╔═╡ 4fce939b-8e4d-48d9-9300-67e065f490f7
md"""
**What is this?**


*This notebook is part of a collection of notebooks on various topics discussed during the Time Domain Astrophysics course delivered by Stefano Covino at the [Università dell'Insubria](https://www.uninsubria.eu/) in Como (Italy). Please direct questions and suggestions to [stefano.covino@inaf.it](mailto:stefano.covino@inaf.it).*
"""

# ╔═╡ b7946586-1b5f-4d29-867f-bd41e49610bf
md"""
**This is a `Julia` notebook**
"""

# ╔═╡ 276b9b1a-95e8-410d-bb54-049142ac0cec
Pkg.instantiate()

# ╔═╡ 9368f133-0234-454b-b396-b7c0c37d3e95
# ╠═╡ show_logs = false
md"""
$(LocalResource("Pics/TimeDomainBanner.jpg"))
"""

# ╔═╡ 4560e210-bd7f-4f28-b0ba-88291e851272
# ╠═╡ show_logs = false
md"""
# Statistics Reminder
***

$(LocalResource("Pics/Bayes.png", :width=>700))

- Bayes’ theorem brings the concept of "degree of belief" into statistical modeling. Bayes’ rule is the mechanism that lets us gradually update the probability of an event as evidence or data is gathered sequentially.

- There is an intriguing line of thought that describes the laws of probability as the natural framework to formalize the logic of science. See, for instance, [Cox's theorem](https://en.wikipedia.org/wiki/Cox%27s_theorem).

$(LocalResource("Pics/knowledgecycle.png", :width=>700))
"""

# ╔═╡ 5cdb5c3c-eab5-4e95-a9da-f4a08ce7ddbf
cm"""
### Bayes' theorem: example of application
***

- Given a disease, `D`, with a *prevalence* of 10%, and a test, `T`, with a *sensitivity* of 95% and a *specificity* of 85%, what is the probability that a subject who tests positive for `D` really is affected by `D`?

- Data:
	- p(D=1) = 0.1              *prevalence*

	- p(T=1 | D=1) = 0.95		*sensitivity*
	- p(T=0 | D=0) = 0.85       *specificity*

```math
p(D=1|T=1) = \frac{p(T=1|D=1)p(D=1)}{p(T=1|D=1)p(D=1) + p(T=1|D=0)p(D=0)} = 
```

```math
= \frac{0.95 \times 0.1}{0.95 \times 0.1 + 0.15 \times 0.9} = 0.413
```

- Indeed, a test of modest reliability.
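
The same calculation can be checked numerically. The snippet below is only a minimal sketch: the function name `positive_predictive_value` is introduced here for illustration and does not belong to any of the packages loaded in this notebook.

```julia
# Hypothetical helper: probability of being affected given a positive test,
# computed with Bayes' theorem from prevalence, sensitivity and specificity.
function positive_predictive_value(prevalence, sensitivity, specificity)
    true_pos = sensitivity * prevalence                 # p(T=1|D=1) p(D=1)
    false_pos = (1 - specificity) * (1 - prevalence)    # p(T=1|D=0) p(D=0)
    return true_pos / (true_pos + false_pos)
end

ppv = positive_predictive_value(0.10, 0.95, 0.85)   # ≈ 0.413
```

Raising the prevalence in the call above shows how strongly the result depends on the prior probability of the disease.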
"""

# ╔═╡ 1a2ed703-b276-4e5d-8e50-c329cc5b83a5
md"""
## Information criteria and hypothesis testing
***

- In a Bayesian scenario everything we can know given the data and the prior information is stated by the Bayes’ Theorem:

```math
p(M,\theta|D,I) = \frac{p(D|M,\theta,I)p(M,\theta|I)}{p(D|I)}
```

- The posterior probability for model $M$ can be computed from the posterior distribution of the parameters after having marginalized (i.e. integrated) over the parameter space.

- If we have two (or more) competing models, by computing the *evidence*, the denominator of the previous expression, we can evaluate the **odds ratio**:

```math
O_{21} \equiv \frac{p(M_2|D,I)}{p(M_1|D,I)}
```

- This is a very sophisticated approach but, often, computationally hard to apply.


> A full discussion of hypothesis testing in a Bayesian scenario is a very interesting topic, but definitely beyond the scope of the course. Nevertheless, during the course, we will often mention this matter. 

- There are simpler alternatives, although with further assumptions and limitations. 
    - They only require computing the maximum likelihood, and not the whole posterior distribution of the parameters.

- A well known criterion is the [Bayesian (or Schwarz) Information Criterion](https://en.wikipedia.org/wiki/Bayesian_information_criterion) (BIC):

```math
BIC \equiv -2 \log [L^0(M)] + k \log N
```

- where $L^0(M)$ is the maximum likelihood for model $M$, $k$ is the number of model parameters and $N$ is the number of data points.

- Another frequently applied criterion is the [Akaike Information Criterion](https://en.wikipedia.org/wiki/Akaike_information_criterion) (AIC), written here with its correction for small sample sizes:

```math
AIC \equiv -2 \log [L^0(M)] + 2k + \frac{2k(k+1)}{N-k-1}
```

- When multiple models are compared, the one with the smallest AIC or BIC is the best model to select. If the models are equally successful in describing the data (i.e. they have the same value of $L^0(M)$), then the model with fewer free parameters "wins".
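
For Gaussian errors with known variances, twice the negative log-likelihood equals the minimum $\chi^2$ up to a constant that is the same for all models fitted to the same data, so the two criteria can be sketched as simple functions of $\chi^2$. The function names and the numbers below are hypothetical and serve only as an illustration; they are not the `BIC` function used in the next cell.

```julia
# Hypothetical helpers (names chosen for this sketch only), valid for
# Gaussian errors with known variances, where -2 log L = χ² + constant.
bic_gauss(chi2, k, n) = chi2 + k * log(n)
aicc_gauss(chi2, k, n) = chi2 + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Two made-up fits of the same 10 data points with identical χ²:
# the model with fewer parameters gets the smaller (better) score.
n_points = 10
bic_line, bic_parabola = bic_gauss(12.0, 2, n_points), bic_gauss(12.0, 3, n_points)
aic_line, aic_parabola = aicc_gauss(12.0, 2, n_points), aicc_gauss(12.0, 3, n_points)
```

With equal $\chi^2$, the BIC difference between the two models is just $\log N$, i.e. the penalty paid for one extra parameter.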

"""

# ╔═╡ 7ebe7987-94e9-480f-b948-c3aa0f70d5d1
begin
	# Let's define two models for a given dataset
	
	f1(x;a=1,b=0) = a.*x .+ b
	f2(x;c=1,a=1,b=0) = c.*x.^2 .+ a.*x .+ b
	
	χ21(prs) = sum((f1(x,a=prs[1],b=prs[2]) .- y).^2 ./ σ.^2)
	χ22(prs) = sum((f2(x,c=prs[1],a=prs[2],b=prs[3]) .- y).^2 ./ σ.^2)
	
	
	fg1 = Figure()
	
	
	# data
	x = [0.96,1.95,2.93,3.98,4.97,6.07,7.02,8.06,8.94,10.05]
	y = [-3.99,25.48,52.17,49.88,44.745,65.00,74.76,69.23,103.26,100.37]
	σ = [9.53,10.12,9.46,8.86,9.80,9.56,9.69,9.35,9.97,9.48]
	
	
	x01 = [10.0,5.0]
	res1 = optimize(χ21, x01)
	
	x02 = [1.,10.0,5.0]
	res2 = optimize(χ22, x02)
	
	
	prs1 = Optim.minimizer(res1)
	prs2 = Optim.minimizer(res2)
	
	printfmtln("Minimum chi2 for model 1 = {:.2f} and model 2: {:.2f}",χ21(prs1),χ22(prs2))
	    
	bc1 = BIC(χ21(prs1),length(x),2)
	bc2 = BIC(χ22(prs2),length(x),3)
	
	printfmtln("BIC for model 1 = {:.2f} and model 2: {:.2f}",bc1,bc2)
	
	
	
	ax1fg1 = Axis(fg1[1, 1])
	
	scatter!(x,y,color=:blue)
	errorbars!(x,y,σ,color=:blue)
	lines!(0:12,f1(0:12,a=prs1[1],b=prs1[2]),color=:orange,label="line")
	lines!(0:12,f2(0:12,c=prs2[1],a=prs2[2],b=prs2[3]),color=:green,label="parabola")
	
	xlims!(0,12)
	ylims!(-20,120)
	
	axislegend()
	
	fg1
	
	
end

# ╔═╡ c0ec48fd-c4f3-4d6d-a854-ef49ead7d512
md"""
- In spite of the decrease of the residuals, the simpler fit is considerably preferred, with $\Delta BIC \sim 4$ (let's remember it is a logarithmic scale).

- If we express it in [decibel](https://en.wikipedia.org/wiki/Decibel) units, we have $\Delta BIC \sim 18$ dB.

- It is also possible to show that, under several hypotheses, $\log Evidence \approx -BIC/2$.
"""

# ╔═╡ ae7c719d-2f46-41d8-8053-b017d6cebdbb
md"""
- Let's try an example of Bayes' theorem application from astronomy: 

## Computing the distance of an open cluster
***

- We infer the distance to a star from a parallax measurement. The true but unknown distance $d$ in kiloparsecs (kpc) is related to the true but unknown parallax $π$ in milliarcseconds (mas) through:

    - $d\,[{\rm kpc}] = \frac{1}{π\,[{\rm mas}]}$

- We are going to use parallax measurements with some fixed uncertainty, $σ$, that we treat as known. 

- In the Bayesian framework, we wish to infer the parameter $d$ given the data $y$ by deriving the posterior distribution of our parameter:

```math
p(d|y) \sim p(y|d)p(d)
```

### Data
***

- Let's load in the data for the NGC2682 ([M67](https://en.wikipedia.org/wiki/Messier_67)) open star cluster
"""

# ╔═╡ 0ab98a92-0c00-4c6d-86c1-82a5e053060f
begin
	hdu = FITS("NGC_2682.fits")

	
	# extract membership probabilities from kinematics (Gaia DR2)
	pmem = read(hdu[2],"HDBscan_MemProb")
	
	# extract parallaxes (Gaia DR2)
	# note: no corrections have been applied for any systematics
	p = read(hdu[2],"Parallax")
	pe = read(hdu[2],"Parallax_Err")
	
	# And select only objects with some probability to be part of the cluster.
	sel = pmem .> 0.3
end;

# ╔═╡ 8d6b9987-b153-41a0-b2f4-b87f51ddf0be
begin
	μp, σp = WeightedArithmeticMean(p[sel],pe[sel])
	
	printfmtln("Parallax weighted mean: {:.3f} mas and error on the mean: {:.3f} mas", μp, σp)
end

# ╔═╡ 438da9b4-83e8-4e99-81ea-2a474877f841
begin
	fg2 = Figure()
	
	ax1fg2 = Axis(fg2[1, 1],
	    xlabel="Parallax (mas)")
	
	hist!(p,bins=50,label="Full sample")    
	hist!(p[sel],bins=50,label="More likely membership")
	
	axislegend()
	
	fg2
end

# ╔═╡ e672614a-6de7-4fc8-9621-3f1939a89e24
md"""
## Field star parallax measurements
***

- Before considering the (actually simpler) case of parallaxes for stars in a cluster, let's see how a Bayesian analysis for measurement of field stars (i.e. not in a cluster) could be carried out.
"""

# ╔═╡ 11a01b9a-6955-4bf7-a35d-85cb5ebbe243
md"""
### The likelihood
***

- We may adopt a Gaussian likelihood for the measured parallax $y$, i.e. $y \sim N(1/d,σ^2)$:

$$p(y|d) = \frac{1}{\sqrt{2πσ^2}} e^{-\frac{(y - 1/d)^2}{2σ^2}}$$


### The Priors
***

- As for any Bayesian analysis, we have to define the prior information we have. And we are going to realize that this is a far from trivial problem.

- We have three different possible choices, although only the last one correctly describes our problem:

    - A uniform distribution for the parallax.
    - A uniform distribution for the distance.
    - A physically motivated prior for the space distribution of the stars we are studying.
"""

# ╔═╡ 11d08a83-330e-4caa-931a-0500700fd215
md"""
- While the uniform prior for the distance may appear non-informative (in the sense that it is flat), it actually encodes a strong assumption about the number density of stars $ρ$ as a function of distance. 
    - The prior implies that we are just as likely to observe stars at large distances as we are at smaller distances. 
    - However, as we look out into space, the volume probed by a given solid angle grows with the distance $d$ as $d^2$, and this in turn implies that the stellar number density is decreasing with distance as $1/d^2$.
"""

# ╔═╡ 763d157a-33eb-4aba-a278-1ee55a7f26bd
md"""
- [Bailer-Jones et al. (2018)](https://iopscience.iop.org/article/10.3847/1538-3881/aacb21) introduced a better prior for the parallax inference problem.

- The physical volume $dV$ probed by an infinitesimal solid angle on the sky $dΩ$ at a given distance $d$ scales as the size of a shell, so that $dV \propto d^2$. This means that, assuming a constant stellar number density $ρ$ everywhere, a prior behaving as $p(d) \propto d^2$ is more appropriate.

- However, we know that our Sun sits in the disk of the Galaxy, and that the actual stellar density $ρ$ as we go radially outward in the disk should decrease as a function of distance. 

- Assuming we are looking outward, and that the stellar density decreases exponentially with a length scale $L$ (so that for a given distance we have $p(ρ|d) \propto e^{−d/L}$) the prior on distance is:

```math
p(d) \propto \begin{cases}
d^2 e^{−d/L} & d_{\rm min} < d < d_{\rm max}\\
0 & {\rm otherwise}
\end{cases}
```

- which is the density function of a truncated [Gamma(3, L)](https://en.wikipedia.org/wiki/Gamma_distribution) distribution.
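
As a minimal sketch of how this prior behaves, the unnormalized density can be written as a plain function and its maximum located on a grid; analytically the mode sits at $d = 2L$. The function name `bj_prior`, the length scale $L = 1.35$ kpc and the distance limits are assumptions made only for this illustration.

```julia
# Unnormalized prior p(d) ∝ d² exp(-d/L) inside (dmin, dmax), zero outside.
# L = 1.35 kpc and the distance limits are illustrative assumptions.
bj_prior(d; L=1.35, dmin=0.0, dmax=10.0) = (dmin < d < dmax) ? d^2 * exp(-d / L) : 0.0

# Locate the maximum on a regular grid of distances (kpc)
dgrid = 0.01:0.01:9.99
dmode = dgrid[argmax(bj_prior.(dgrid))]   # numerically close to 2L
```

The prior vanishes at small distances, where little volume is probed, and is suppressed at large distances by the exponential decline of the stellar density.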


"""

# ╔═╡ 82cd1ab2-0f25-4cc6-969c-e79c7d395320
md"""
> A uniform distribution in parallax is not a uniform distribution in distance, $d = 1/\pi$. Actually, if $\pi$ is uniformly distributed between $a$ and $b$, the [inverse](https://en.wikipedia.org/wiki/Inverse_distribution) is distributed as $1/d^2$ between $b^{-1}$ and $a^{-1}$.
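
This result follows from the usual change-of-variables rule for probability densities. As a short sketch, assuming $0 < a < b$ and writing $p_π$ for the uniform density of the parallax (a notation introduced only for this derivation):

```math
p(d) = p_π(π)\left|\frac{dπ}{dd}\right| = \frac{1}{b-a}\,\frac{1}{d^2}, \qquad b^{-1} \le d \le a^{-1}
```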
"""

# ╔═╡ 67b0f7d8-1111-4c50-90a4-991de8f2db23
md"""
> It is clear how uniform priors are usually far from being uninformative.
"""

# ╔═╡ 2a2f37cf-b026-4b9a-b45e-adf13d00c8a2
md"""
### Posterior distribution
***

- Once the prior distribution(s) and the likelihood function are specified, the posterior distribution is uniquely determined. 
    - Often, the denominator quantity $p(y)$ is not available analytically and can be estimated by numerical integration.

- The posterior distribution enables inference of model parameters or of quantities that can be derived from model parameters. 
    - For example, the posterior mean $\mathcal{E}(θ | y)$ is a point summary for $θ$. The posterior distribution can also be used to define credible intervals for parameters that provide a range of probable values.

> We stress that credible intervals are not confidence intervals; a 95\% credible interval suggests that there is a 95\% probability that the parameter lies within the specified range given our prior beliefs, the model, and the data, whereas a 95\% confidence interval suggests that if similar intervals are properly constructed for multiple datasets then 95\% of them are expected to contain the true (fixed) parameter value.

- The posterior distribution also provides a useful way to obtain estimates and credible intervals for other quantities of physical interest.
    - For example, if a model has parameters $θ = (α, β, κ)$ and there is some physical quantity described by e.g., $γ = β^2 /αe^κ$, then for every sample of $θ$, a sample of $γ$ can be calculated. Thus, a distribution of the physically interesting quantity $γ$ is obtained, which can also be used to obtain point estimates and credible intervals.
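
The propagation of posterior samples to a derived quantity can be sketched as follows. The samples are drawn here from made-up Gaussians that merely stand in for a real posterior, and the expression for $γ$ is read as $β^2/(α e^κ)$; both choices are assumptions of this example.

```julia
using Random, Statistics

rng = MersenneTwister(42)
n = 10_000

# Stand-in "posterior samples" (hypothetical values, not from a real fit)
α = 2.0 .+ 0.10 .* randn(rng, n)
β = 1.0 .+ 0.05 .* randn(rng, n)
κ = 0.5 .+ 0.02 .* randn(rng, n)

# One value of the derived quantity for every posterior sample
γ = β .^ 2 ./ (α .* exp.(κ))

# Point estimate and 95% credible interval from the sample quantiles
γ_med = median(γ)
γ_lo, γ_hi = quantile(γ, [0.025, 0.975])
```

No error-propagation formula is needed: correlations among the parameters, if present in the samples, are carried over to $γ$ automatically.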
"""

# ╔═╡ 52a8e283-1ce9-422a-a395-31aca182be57
md"""
- Here, we are interested in inferring the parameter for the distance $d = 1/π$ given the measured parallax $y$ and its known associated measurement uncertainty $σ$.

- From Bayes’ theorem, the posterior is:

```math
p(d|y) \sim p(y|d)p(d)
```

For the three possible priors discussed previously, and recalling that if $f$ is uniformly distributed the PDF of the reciprocal, $y=1/f$, is $\sim \frac{1}{y^2}$, we have:

- Uniform parallax:
```math
p(d|y) \sim \frac{1}{d^2} e^{−\frac{(y − 1/d)^2}{2σ^2}}.
```

- Uniform distance:
```math
p(d|y) \sim e^{−\frac{(y − 1/d)^2}{2σ^2}}.
```

- Bailer-Jones:
```math
p(d|y) \sim d^2 e^{−\frac{d}{L}-\frac{(y − 1/d)^2}{2σ^2}}.
```

- for $d_{\rm min} < d < d_{\rm max}$ (and $0$ otherwise). 

$(LocalResource("Pics/posterior.png"))
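
The effect of the three priors can also be explored numerically. The following is a minimal sketch with a hypothetical measurement (parallax 1.0 mas with an uncertainty of 0.2 mas) and an assumed length scale of 1.35 kpc; all function names are introduced only for this example.

```julia
# Gaussian likelihood of the observed parallax y for a true distance d
like(d, y, σ) = exp(-(y - 1 / d)^2 / (2 * σ^2))

# Unnormalized posteriors for the three priors discussed above
post_uniform_parallax(d, y, σ) = like(d, y, σ) / d^2
post_uniform_distance(d, y, σ) = like(d, y, σ)
post_bailer_jones(d, y, σ; L=1.35) = d^2 * exp(-d / L) * like(d, y, σ)

y_obs, σ_obs = 1.0, 0.2          # hypothetical parallax and error (mas)
ds = 0.2:0.001:5.0               # distance grid (kpc)

# Position of the posterior maximum on the grid
mode_of(f) = ds[argmax(f.(ds, y_obs, σ_obs))]

mode_up = mode_of(post_uniform_parallax)
mode_ud = mode_of(post_uniform_distance)
mode_bj = mode_of(post_bailer_jones)
```

For the same measurement the three posterior modes differ: the uniform-parallax prior pulls the estimate towards smaller distances, while the volume term of the last prior pushes it outwards.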
"""

# ╔═╡ 1e6940a0-1b11-4ced-ae11-b016088f1414
md"""
## Cluster (M67) distance
***

- What we have written so far holds for a single measurement. However, for the M67 distance, we work with a set of $n$ measurements for stars that we can assume to be located at approximately the same distance $d_{\rm cluster}$, with measured parallaxes $y = \{y_1, y_2, \ldots, y_n\}$ that we consider independent given $d_{\rm cluster}$.

- For each measurement the Gaussian likelihood can still be applied, and the combined likelihood is the product of the individual likelihoods, again assuming that the measurement uncertainties, $σ_i$, are known:

```math
p(y_1, y_2, \ldots, y_n | 1/d_{\rm cluster}) = \prod_{i=1}^{n} p(y_i |1/d_{\rm cluster})
```

- The prior information about the distance of the cluster (from past measurements, etc.) is now a simple Normal distribution, e.g. $p(d_{\rm cluster}) \sim \mathcal{N}(\mu=1.2,\sigma=0.1)$. 

- Finally, the full posterior turns out to be:

```math
p(d_{\rm cluster}|y) \sim p(d_{\rm cluster}) \prod_{i=1}^{n} p(y_i |1/d_{\rm cluster})
```

- Now, since the prior PDF does not (of course) depend on the measurements, the only factor affected by the product is $\prod_{i=1}^{n} \exp \left(-\frac{(y_i - 1/d_{\rm cluster})^2}{2σ_i^2}\right)$. Since the product of $n$ independent Gaussian densities with known variances is a Gaussian with precision parameter $τ^{-2} = \sum^n_{i=1} σ_i^{-2}$ and mean parameter $1/d_{\rm cluster}$, defining $y_{\rm eff} = τ^2 \sum^n_{i=1} y_i/σ^2_i$ we can write:

```math
p(d_{\rm cluster}|y) \sim p(d_{\rm cluster}) e^{-\frac{(y_{\rm eff}− 1/d_{\rm cluster})^2}{2τ^2}}
```
    
- As mentioned above, please note that the prior $p(d_{\rm cluster})$, which governs the distribution of clusters of stars, is not the same as the prior $p(d)$, which governs the distribution of individual stars.
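
The equivalence between the product of the individual likelihoods and a single Gaussian in $y_{\rm eff}$ can be verified with a few lines of code. This is only a sketch with three made-up parallaxes: the logarithms of the two expressions must differ by a constant that does not depend on the distance.

```julia
# Hypothetical parallaxes and uncertainties (mas) for three cluster members
ys = [1.10, 1.25, 1.18]
σs = [0.05, 0.10, 0.08]

# Combined variance τ² and inverse-variance weighted mean y_eff
τ2 = 1 / sum(1 ./ σs .^ 2)
yeff = τ2 * sum(ys ./ σs .^ 2)

# Log of the product of the n Gaussians (normalization constants dropped)
log_product(d) = -sum((ys .- 1 / d) .^ 2 ./ (2 .* σs .^ 2))

# Log of the single Gaussian centered on y_eff with variance τ²
log_single(d) = -(yeff - 1 / d)^2 / (2 * τ2)

# The difference is the same for every distance d
offset(d) = log_product(d) - log_single(d)
```

Since the offset is independent of $d$, it is absorbed by the normalization of the posterior, and only $y_{\rm eff}$ and $τ$ are needed.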
"""

# ╔═╡ de5d832e-6bc5-41fb-8712-c61b9ed98cf2
begin
	# Let's compute τ and yeff
	
	t = sum(pe[sel].^(-2))
	τ2 = 1 ./ t
	
	yeff = τ2 .* sum(p[sel] ./ pe[sel].^2)
	
	
	# The prior
	pdcl = Normal(1.2,0.1)
	
	
	
	fg3 = Figure()
	
	ax1fg3 = Axis(fg3[1, 1],
	    title="Posterior plot",
	    xlabel="Distance (kpc)",
	    )
	
	
	pstr(x) = pdf(pdcl,x) .* exp.( -((yeff .- (1 ./ x)).^2)/(2τ2))
	
	rng = range(start=0.87,stop=0.895,step=1e-4)
	
	dtt = pstr(rng)
	
	lines!(rng, dtt ,label="posterior")
	
	axislegend()
	
	#ylims!(0,0.5)
	#xlims!(0,6)
	
	fg3
end

# ╔═╡ e801975e-c374-4474-8b65-eda9179cb740
md"""
## Analysis of the posterior distribution of parameters
***

- The previous example is one-dimensional and, involving only Gaussians, can be studied analytically. However, as soon as the number of parameters grows, direct exploration of the posterior PDF becomes impractical.

- One needs sampling tools. The most widely used is Markov Chain Monte Carlo (MCMC), yet there are several alternatives.

- Having produced a posterior $p(\theta)$, in general we might want to estimate integrals as:

```math
I(\theta) = \int g(\theta) p(\theta) d\theta
```

- e.g. for marginalization $g(θ)=1$, average $g(θ)=θ$, credible intervals, etc.

"""

# ╔═╡ 06dae479-7348-47e4-9229-f8ff39254bdd
md"""
### Markov Chain Monte Carlo
***

- When a direct evaluation of an integral, numerically or analytically, is impossible, a Monte Carlo approach is often feasible. 

- Let’s generate $M$ values of the parameter set uniformly sampled within the integration volume $V_θ$. The integral turns out to be: 

```math
I \approx \frac{V_\theta}{M} \sum_{j=1}^M g(\theta_j) p(\theta_j)
```

- The algorithm works, but it is very inefficient, in particular for high-dimensional integrals.

- MCMC methods return a sample of points, or chain, from the k-dimensional parameter space, with a distribution that is asymptotically proportional to $p(θ)$. With such a chain the integral becomes:

```math
I \approx \frac{1}{M} \sum_{j=1}^M g(\theta_j)
```

- E.g., to estimate the expectation value for $θ$ (i.e., $g(θ) = θ$), we simply take the mean value of all $θ$ in the chain.
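
The two estimators can be compared on a toy problem. In this sketch the posterior is assumed to be a standard normal and $g(θ) = θ^2$, so that the exact value of the integral is 1; samples drawn directly with `randn` stand in for an MCMC chain.

```julia
using Random

rng = MersenneTwister(2025)
M = 200_000

p_norm(θ) = exp(-θ^2 / 2) / sqrt(2π)   # assumed posterior: standard normal
g(θ) = θ^2                              # quantity to average; exact answer is 1

# (a) points sampled uniformly in the integration volume V = [-5, 5]
a, b = -5.0, 5.0
θ_unif = a .+ (b - a) .* rand(rng, M)
I_uniform = (b - a) / M * sum(g.(θ_unif) .* p_norm.(θ_unif))

# (b) points distributed as p(θ), standing in for an MCMC chain
θ_chain = randn(rng, M)
I_chain = sum(g.(θ_chain)) / M
```

In one dimension both estimates are accurate; the advantage of sampling from $p(θ)$ shows up in many dimensions, where almost all uniformly drawn points fall in regions of negligible probability.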

> A Markov chain is a sequence of random variables where a given value depends only on its preceding value. Given the present value, past and future values are independent. The process generating such a chain is called the Markov process.

- A "daily life" (irony here) example of a Markov chain could be a model of a baby's behavior: you might include "playing", "eating", "sleeping", and "crying" as states, which together with other behaviors form a "state space", i.e. a list of all possible states. 
- On top of the state space, a Markov chain tells you the probability of hopping, or "transitioning", from one state to any other state, e.g., the chance that a baby currently playing will fall asleep in the next five minutes without crying first.

- Coming back to "serious stuff", a Markov process can be described as: 

```math
p(θ_{i+1}|\{θ_i\}) = p(θ_{i+1}|θ_i)
```

- i.e., each step depends only on the previous one (such a chain is called "of the first order").
    
- To reach an equilibrium, or stationary, distribution of positions $p(θ)$, it is sufficient that the transition probability satisfies the detailed balance (or reversibility) condition: 

```math
p(θ_{i+1}|θ_i)\,p(θ_i) = p(θ_i|θ_{i+1})\,p(θ_{i+1})
```

- There are many algorithms for producing Markov chains that reach some prescribed equilibrium distribution, $p(θ)$, and this is a field of very active research.

> Interactive demo: [https://chi-feng.github.io/mcmc-demo/](https://chi-feng.github.io/mcmc-demo/)
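
One of the simplest such algorithms is the Metropolis sampler with a symmetric Gaussian proposal. The code below is a minimal sketch, not the sampler used later in this notebook: the function name `metropolis`, the target (a Gaussian with mean 2 and unit variance), the step size and the number of discarded initial steps are all assumptions made for illustration.

```julia
using Random

# Random-walk Metropolis for a one-dimensional target with log-density logp
function metropolis(logp, θ0, nsteps; step=1.0, rng=Random.default_rng())
    chain = Vector{Float64}(undef, nsteps)
    θ, lp = θ0, logp(θ0)
    for i in 1:nsteps
        θnew = θ + step * randn(rng)          # symmetric Gaussian proposal
        lpnew = logp(θnew)
        # accept with probability min(1, p(θnew) / p(θ))
        if log(rand(rng)) < lpnew - lp
            θ, lp = θnew, lpnew
        end
        chain[i] = θ                          # a rejected move repeats θ
    end
    return chain
end

# Toy target: Gaussian with mean 2 and unit variance (unnormalized log-density)
logtarget(θ) = -0.5 * (θ - 2.0)^2

chain_demo = metropolis(logtarget, 0.0, 50_000; rng=MersenneTwister(1))
kept = chain_demo[5_001:end]                  # drop the first 5000 steps
post_mean = sum(kept) / length(kept)
```

Only ratios of the target density enter the acceptance rule, so the normalization of the posterior (the evidence) is never needed.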
"""

# ╔═╡ b477227b-42fa-40dc-ab6d-e1c7a7ac3edf
md"""
### A probabilistic model
***

- Let's now write a simple probabilistic model to fit a straight line.
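
The model defined in the following cells is parametrized by the angle $θ$ between the line and the $x$ axis and by the perpendicular distance $q_\perp$ of the line from the origin, rather than directly by the slope $m$ and the intercept $q$. As a sketch of what this choice implies, the two parametrizations are related by:

```math
m = \tan θ, \qquad q = \frac{q_\perp}{\cos θ}
```

A uniform prior on $θ$ between $-π/2$ and $π/2$ then corresponds to a prior on the slope that does not favor steep lines:

```math
p(m) = p(θ)\left|\frac{dθ}{dm}\right| = \frac{1}{π}\,\frac{1}{1+m^2}
```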
"""

# ╔═╡ 3ab330b1-3071-4729-ab09-98286ef82573
begin
	data = Array([[ 0.42,  0.72,  0.  ,  0.3 ,  0.15,
	                   0.09,  0.19,  0.35,  0.4 ,  0.54,
	                   0.42,  0.69,  0.2 ,  0.88,  0.03,
	                   0.67,  0.42,  0.56,  0.14,  0.2  ],
	                 [ 0.33,  0.41, -0.22,  0.01, -0.05,
	                  -0.05, -0.12,  0.26,  0.29,  0.39,
	                   0.31,  0.42, -0.01,  0.58, -0.2 ,
	                   0.52,  0.15,  0.32, -0.13, -0.09 ],
	                 [ 0.1 ,  0.1 ,  0.1 ,  0.1 ,  0.1 ,
	                   0.1 ,  0.1 ,  0.1 ,  0.1 ,  0.1 ,
	                   0.1 ,  0.1 ,  0.1 ,  0.1 ,  0.1 ,
	                   0.1 ,  0.1 ,  0.1 ,  0.1 ,  0.1  ]])
	x2, y2, ey2 = data
	
	
	# And let's plot them
	
	fg4 = Figure()
	ax4fg4 = Axis(fg4[1, 1],
	    xlabel = "X",
	    ylabel = "Y"
	    )
	scatter!(x2,y2,label="Observations")
	errorbars!(x2,y2,ey2)
	axislegend(ax4fg4)
	fg4
end

# ╔═╡ abfb9e39-f691-4471-9099-d7be3309d6d9
begin
	# Declare our Turing model
	@model function jmodel(x,ey,y)
	    # Our prior belief
	    theta ~ Uniform(-0.5 * π, 0.5 * π)
	    q_perp ~ Normal(0.,5.)
	    #
	    m = tan.(theta)
	    q = q_perp./cos.(theta)
	    #
	    #for i in 1:length(y)
	    #    y[i] ~ Normal.(m*x .+ q,ey)
	    #end
	    return y ~ MvNormal(m*x .+ q,ey)
	end
	
	# Please, pay attention to how we write the parameters of the model.
	# This allows us to use simple priors!
	
	# Let's sample our model with 2000 iterations, 1000 of them as burn-in, 
	# and 4 different chains.
	
	iterations = 2000
	burnins = 1000
	chains = 4
	
	# We apply a No-U-Turn sampler
	
	chain = mapreduce(c -> sample(jmodel(x2,ey2,y2), NUTS(burnins,0.65), iterations), chainscat, 1:chains)
	
end

# ╔═╡ 0b176a7f-b0ad-4a63-963c-80433eee24e0
StatsPlots.plot(chain)

# ╔═╡ 1e8e2db8-d432-4385-9709-e16ea9197e13
begin
	# Let's produce a popular "cornerplot"
	
	dfc = DataFrame(chain)     
	
	dfc[!,:q_post] = dfc[!,:q_perp]./cos.(dfc[!,:theta])
	dfc[!,:m_post] = tan.(dfc[!,:theta])
	          
	dfcs = select(dfc, [:m_post,:q_post])
	
	fig = pairplot(dfcs, labels = Dict(
	        :m_post => "m",
	        :q_post => "q",    
	    ))
end

# ╔═╡ 628265ac-116a-4fe0-8808-b95c266df1ab
begin
	# And, finally, let's show the posterior superimposed on the data.
	
	fg5 = Figure()
	ax5fg5 = Axis(fg5[1, 1],
	    xlabel = "X",
	    ylabel = "Y"
	    )
	scatter!(x2,y2,label="Observations")
	errorbars!(x2,y2,ey2)
	
	for i in rand(1:nrow(dfc),100)
	    yc = dfc[i,:m_post]*x2.+dfc[i,:q_post]
	    lines!(x2,yc,color=:orange,alpha=0.1)
	end
	axislegend(ax5fg5)
	fg5
end

# ╔═╡ 03e24e91-f126-4009-96f4-a32419615dc0
md"""
- As we know, in classical statistics, models are evaluated by hypothesis testing, i.e. a null hypothesis (e.g., that the data do not contain a signal of interest) is rejected if, under the null, the probability of observing data as extreme or more extreme than what has been observed is small (the so-called p-value).

- An alternative viewpoint is offered by a Bayesian approach to probability, in which the focus is shifted away from rejecting a null hypothesis to comparing alternative explanations. Again as we have seen already, this is done by means of the posterior odds (ratio of probabilities) between two (or several) competing models.

- Nevertheless, without entering into a lengthy discussion, it is possible to compute *Bayesian p-values* for a model, as discussed in [Lucy (2016)](https://ui.adsabs.harvard.edu/abs/2016A%26A...588A..19L/abstract).
"""

# ╔═╡ 942af7a7-6bc1-495c-831c-bcd43472d849
md"""
## Reference & Material

Material and papers related to the topics discussed in this lecture.

- [Trotta (2017) - "Bayesian Methods in Cosmology"](https://ui.adsabs.harvard.edu/abs/2017arXiv170101467T/abstract)
- [Karamanis (2023) - "Bayesian Computation in Astronomy: Novel methods for parallel and gradient-free inference (Chapter 1-3)"](https://ui.adsabs.harvard.edu/abs/2023arXiv230316134K/abstract)
"""

# ╔═╡ 972bdb87-8cb3-48f1-ae0d-23c6bbe63f20
md"""
## Further Material

Papers for examining more closely some of the discussed topics.

- [E.T. Jaynes - "Probability Theory: The Logic of Science"](https://www.ibs.it/probability-theory-logic-of-science-libro-inglese-e-t-jaynes/e/9780521592710?gad_source=1&gclid=CjwKCAjwmYCzBhA6EiwAxFwfgKa47QGwGtjBEL2-Kl5nPVBIlUPFMw2X-Ic0HPba0KyKQIfqqUFygxoCuqIQAvD_BwE)
- [A. Gelman et al. - "Bayesian Data Analysis"](http://www.stat.columbia.edu/~gelman/book/)
- [J. VanderPlas (2014) - "Frequentism and Bayesianism: A Python-driven Primer"](https://ui.adsabs.harvard.edu/abs/2014arXiv1411.5018V/abstract)
- [L.B. Lucy (2016) - "Frequentist tests for Bayesian models"](https://ui.adsabs.harvard.edu/abs/2016A%26A...588A..19L/abstract)
"""

# ╔═╡ 1febcfb8-6d86-4b39-aa5e-0d31247f3ee4
md"""
### Credits
***

This notebook contains material obtained from [https://arxiv.org/pdf/2302.04703](https://arxiv.org/pdf/2302.04703) and [https://github.com/joshspeagle/nrp_astrobayes](https://github.com/joshspeagle/nrp_astrobayes).
"""

# ╔═╡ 75ce499b-fd01-4517-a0b3-aeeefa55a790
cm"""
## Course Flow

<table>
  <tr>
    <td>Previous lecture</td>
    <td>Next lecture</td>
  </tr>
  <tr>
    <td><a href="./open?path=Lectures/Lecture - Statistics Reminder/Lecture-StatisticsReminder.jl">Reminder of frequentist statistics</a></td>
    <td><a href="./open?path=Lectures/Lecture - Statistics Reminder/Lecture - Spectral Analysis/Lecture-SpectralAnalysis.jl">Lecture about spectral analysis</a></td>
  </tr>
 </table>


"""

# ╔═╡ b73a652f-0c96-456a-97c8-efb0493ef130
md"""
**Copyright**

This notebook is provided as [Open Educational Resource](https://en.wikipedia.org/wiki/Open_educational_resources). Feel free to use the notebook for your own purposes. The text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/), the code of the examples, unless obtained from other properly quoted sources, under the [MIT license](https://opensource.org/licenses/MIT). Please attribute the work as follows: *Stefano Covino, Time Domain Astrophysics - Lecture notes featuring computational examples, 2025*.
"""

# ╔═╡ Cell order:
# ╟─4fce939b-8e4d-48d9-9300-67e065f490f7
# ╟─b7946586-1b5f-4d29-867f-bd41e49610bf
# ╠═e89ac452-28e1-4f30-808d-468d52fddba7
# ╠═276b9b1a-95e8-410d-bb54-049142ac0cec
# ╠═11e8bdad-39b4-41b7-a75d-484412b76190
# ╟─9368f133-0234-454b-b396-b7c0c37d3e95
# ╟─4560e210-bd7f-4f28-b0ba-88291e851272
# ╟─5cdb5c3c-eab5-4e95-a9da-f4a08ce7ddbf
# ╟─1a2ed703-b276-4e5d-8e50-c329cc5b83a5
# ╠═7ebe7987-94e9-480f-b948-c3aa0f70d5d1
# ╟─c0ec48fd-c4f3-4d6d-a854-ef49ead7d512
# ╟─ae7c719d-2f46-41d8-8053-b017d6cebdbb
# ╠═0ab98a92-0c00-4c6d-86c1-82a5e053060f
# ╠═8d6b9987-b153-41a0-b2f4-b87f51ddf0be
# ╠═438da9b4-83e8-4e99-81ea-2a474877f841
# ╟─e672614a-6de7-4fc8-9621-3f1939a89e24
# ╟─11a01b9a-6955-4bf7-a35d-85cb5ebbe243
# ╟─11d08a83-330e-4caa-931a-0500700fd215
# ╟─763d157a-33eb-4aba-a278-1ee55a7f26bd
# ╟─82cd1ab2-0f25-4cc6-969c-e79c7d395320
# ╟─67b0f7d8-1111-4c50-90a4-991de8f2db23
# ╟─2a2f37cf-b026-4b9a-b45e-adf13d00c8a2
# ╟─52a8e283-1ce9-422a-a395-31aca182be57
# ╟─1e6940a0-1b11-4ced-ae11-b016088f1414
# ╠═de5d832e-6bc5-41fb-8712-c61b9ed98cf2
# ╟─e801975e-c374-4474-8b65-eda9179cb740
# ╟─06dae479-7348-47e4-9229-f8ff39254bdd
# ╟─b477227b-42fa-40dc-ab6d-e1c7a7ac3edf
# ╠═3ab330b1-3071-4729-ab09-98286ef82573
# ╠═abfb9e39-f691-4471-9099-d7be3309d6d9
# ╠═0b176a7f-b0ad-4a63-963c-80433eee24e0
# ╠═1e8e2db8-d432-4385-9709-e16ea9197e13
# ╠═628265ac-116a-4fe0-8808-b95c266df1ab
# ╟─03e24e91-f126-4009-96f4-a32419615dc0
# ╟─942af7a7-6bc1-495c-831c-bcd43472d849
# ╟─972bdb87-8cb3-48f1-ae0d-23c6bbe63f20
# ╟─1febcfb8-6d86-4b39-aa5e-0d31247f3ee4
# ╟─75ce499b-fd01-4517-a0b3-aeeefa55a790
# ╟─b73a652f-0c96-456a-97c8-efb0493ef130
